-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADR 006: Block Propagation with Rows #434
Conversation
Note to me: Add references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in order to merge this ADR, slightly more details are needed. The main purpose as discussed is to get a good overview of the required changes so we can triage this and make an informed decision if we want to do this now or later (without just yet going through the whole process of implementation, testing, reviews etc).
Why only rows, and not columns too? |
It does not make much sense to broadcast both during consensus. It would only increase bandwidth requirements further and introduce more complexity on the sending and receiving end with no real benefits. Or what would those be? The purpose of block data propagation during consensus is only to gossip the data to validators (and tendermint full nodes). If a block proposer withholds the data during consensus, no validator would sign off the block and no light client would ever see that block. |
What is the relationship between RowSet and DAHeader? RowSet isn't a commitment to the row data right? Only DAHeader is? |
This ADR is, Besides after a block proposal, blocks also need to be gossiped after a validator goes back online during replay, and the during fast-sync. While the block's We could keep the edit: the above two paragraphs are not true and can be ignored There's also the fact that Despite these things, I still think |
@evan-forbes, so I double-checked on that and
What does that control give us? My intuition is that dynamic gossiped chunk is better than fixed. The gossiped chunk size dynamicity proportional to block size will affect propagation positively. Also, in such a case, we always get an equal chunk, while in the case of Parts, the last chunk always has unpredictable size. I would say that even for debugging any issues Rows would give us more control and precise info. Just by seeing the block size 128x128 somewhere in the proposal stage, you can immediately understand how many messages would be sent their size. |
@musalbas, RowSet is a helper structure that wraps DAHeader and tracks received Rows with their integrity against DAHeader and tells its user when the block is complete and/or can be recovered. Mostly it is a helper and is not a high-level concept.
Right
Yes |
@Wondertan You're right, I was very mistaken, the only time blocks are gossiped in parts is during proposal. That means that we could implement the full proposed change faster than I originally thought.
While the current implementation uses a constant for the part size, we could easily change that to be dynamic should it be beneficial for some reason. The current implementation would give us more flexibility into the exact size chosen, but I don't have any intuition on whether this would actually be needed or not. As mentioned in an earlier comment, a single row of the max extended square size would be identical to |
Since the data we're committing to in the DAHeader is erasure coded, I wonder if can also do erasure coded block gossiping similar to this or this. This should be possible by extending this ADR so that the extended rows are gossiped too, and once the node receives enough rows, it can reconstruct the entire block by itself. The efficiency comes from the fact that the node needs any n of 2n rows to recover the block, rather than all n rows. |
I think the preferred way of moving forward with this:
Note that 1. and 2. are independent of 3 and only necessary that the blocks produced and finalized by validators only contain a single commitment to the data in the header (data root in the header commiting to the DAHeader). This is the bare minimum changes we should definitely do before launch! Decoupling the implementation detail on how data is gossiped from the BlockID in the header should actually make the implementation of 3. a bit easier to achieve. (also we should consider upstreaming this as the PartSetHeader should not be in the vanilla tendermint header too). Then, 4. is a bit more involved and while it certainly is the most beautiful and clean approach. Also, we might need to go down the route of 4. anyways in case the naive block propagation in tendermint does not work well with really large blocks. Similarly, this could also be upstreamed would the implementation turn out to be either of similar performance (but more robust in the light of malicious peers/network failures) or faster even (and still more robust). 3. and 4. could ofc also combined into one ADR instead but this certainly requires much more details. Does this make sense to everyone involved? |
1. what it would take to remove the PartSetHeader from BlockID 2. keep gossiping as is vs change gossiping Conclusion 1. could be finished in a few days (really): most time will be spent fixing tests. It will create a lot of changes but if carefully committed, it should still be reviewable. Need clarity if we should remove the DAHeader from the Proposal until we switch to gossiping e.g. to rows / erasured data etc. ref #434 #184)
@musalbas, Yeah, but that would also require us to send a whole extended sqaure, while recovery would happen by rows only(4x more data, but only 1D recovery). I agree that we should go ideally with the erasure coding approach, but to make this fully efficient for our 2D case, we should also repair by columns, not just by rows, thus we need share-level gossiping. Share level gossiping was discussed here, but we are currently not interested in that as not "pragmatic". I don't think that implementing share-level gossiping using tm's gossiping would be a good idea both now and in general, but just rows with actual data will fit ideally there instead of existing Parts, so I would keep this ADR focused on that and live without erasure coding in consensus unless we decide to rework consensus profoundly(If I had a choice to do so, I would have done it earlier than later). |
You don't need to gossip the entire extended square, just the first half of all the rows (i.e. a vertical rectangle). This is sufficient to recover all the original rows. I'm not talking about share gossiping, only row gossiping. I think share gossiping is unnecessary as the shares are too small to be gossiped individually. |
@MusAlba, so in the case of a vertical rectangle, we don't need to send extended rows anymore and only columns, right? |
Can you elaborate? What’s the problem with gossiping smaller chunks? Currently, we pull request shares(in storage and DA node cases), and if comparing with gossiping those, the latter seems a much better approach, because 256 bytes are very little for one request of data. Also, doesn't share-level gossiping establish Data Availability properties for gossiping? |
Let's look at the extended square:
What @musalbas is suggesting is to gossip
during consensus (and row by row). Alternatively, this would also work:
but the downside that (at least with the current rsmt2d implementation) you'd have to erasure code twice to get the full square ( |
The smaller the chunks, the more we create just overhead (on the network level as well as computational overhead on the receiving end). There is likely a sweet spot for chunk size. It seems like the 64KB perform well in practice for instance. IMO we should better understand the tradeoffs between fixed-size chunks and rows (dynamic sized) if we decide to change the gossiping mechanism (and before the implementation). |
@liamsi, I understand your argument, and in case sending Merkle proof is needed, then share-level gossiping might be not a good idea. However, your message states sending Merkle proofs along as a certainty, while it is not clear why we need that in the first place. The fact that Tendermint sends those with parts, does mean that this applies to us in the same way. DAHeader of any form should be sufficient to validate Rows at least. Note that I do not have a strong opinion about share-level gossiping in general, while I do have a strong opinion that in scalable p2p networks every node/peer should care only about the data of its own interest and the share-level gossiping model fits in here best. And that's not only about DAS validators, but the entire network.
Totally agree, but share-level gossiping is not part of this ADR and is just a side conversation, not a proposal. I think we can close the share-level gossiping discussion for now, as it is not a priority, mostly 'wonderting'. Though we still need to go on with this ADR and as discussed, we can wait for experiments to go on implementation and then look at the results. Also, you've mentioned Merkle Proofs that I removed from gossiping in this ADR. In the case of DAHeader, they are not needed and provide us with additional bandwidth savings. What's your opinion on that? |
Yeah, that is a good point. Sending along the proofs with the shares is a mechanism to detect as early as possible if someone is sending you garbage (while sacrificing some bandwidth). IMO, it only makes sense in the case of larger block sizes (which is not the case in many / most tendermint-based applications). In case of large blocks, you would not want to download a lot of data until you know if someone is sending you made-up data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing: the lazy-adr
directory was renamed to just adr
. Can you rebase your changes on the lates master (or merge in master) and move adr 006 into the correct directory as well?
Rebasing should work, ok |
Co-authored-by: Ismail Khoffi <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Co-authored-by: Evan Forbes <[email protected]>
Co-authored-by: Evan Forbes <[email protected]>
Co-authored-by: Evan Forbes <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
5dbcf3a
to
79385cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience @Wondertan. I feel the discussions in this PR as well as the ADR itself will become relevant again in the future 👍🏼
Me either 🚀 |
Follow-up to #427. The workflow in #427 is merely informational, which is helpful, but can be bypassed. It's probably best to enforce this check when we push tags. If the check fails, it will prevent the cutting of a release, meaning we'll have to delete the tag, fix the version number, and tag again. --- #### PR checklist - [ ] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] Updated relevant documentation (`docs/` or `spec/`) and code comments (cherry picked from commit 0d3c2f3) Co-authored-by: Thane Thomson <[email protected]>
Rendered